CS 294-1 Assignment 1 Report
Abstract
Text classification has a growing range of applications, such as recommender systems and customer service. The goal of this assignment is to apply a Naive Bayes classifier to a data set of labeled textual movie reviews and to practice Scala/ScalaNLP. The data set, “Polarity dataset v2.0”, is from http://www.cs.cornell.edu/People/pabo/movie-reviewdata/ and was created by Bo Pang and Lillian Lee at Cornell. The reviews originally included numerical scores (−4 to +4), but they have been partitioned into positive and negative sets of matched size. The data therefore contains 1000 positive and 1000 negative reviews, all written before 2002.
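As a rough illustration of the kind of classifier the abstract refers to, here is a minimal multinomial Naive Bayes sketch with additive smoothing. It is written in Python for brevity and is not the Scala/ScalaNLP code from the report. The function names `train` and `predict`, the smoothing parameter `alpha`, and the four toy reviews are all assumptions made for this example; the real data set has 1000 reviews per class.

```python
import math
from collections import Counter

def train(docs, alpha=1.0):
    """Count labels and per-label word frequencies.

    docs: list of (tokens, label) pairs. alpha is the additive
    smoothing constant (hypothetical default, not from the report).
    """
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"labels": label_counts, "words": word_counts,
            "vocab": vocab, "alpha": alpha}

def predict(model, tokens):
    """Return the label with the highest log posterior score."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["labels"].items():
        counts = model["words"][label]
        denom = sum(counts.values()) + alpha * vocab_size
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in model["vocab"]:  # skip words never seen in training
                score += math.log((counts[tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy stand-in for the labeled movie reviews (invented for illustration).
toy = [
    ("a great moving film".split(), "pos"),
    ("great acting and a great script".split(), "pos"),
    ("a dull boring film".split(), "neg"),
    ("boring plot and dull acting".split(), "neg"),
]
model = train(toy)
print(predict(model, "great script".split()))
print(predict(model, "dull plot".split()))
```

Working in log space avoids numerical underflow when a review contains many tokens, and the smoothing term keeps a word that was seen in only one class from forcing the other class's probability to zero.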
Similar resources
CS 294-1: Assignment 1 Naive Bayes Classification with Improvements
The main objective of this assignment was to implement a Naive Bayes classifier and attempt certain improvements upon the vanilla version. A major challenge was to implement the classifier in Scala using the two libraries scalala and scalanlp. This report presents details regarding the different experiments I tried out, namely varying the smoothing parameter, feature selection, n-gram models an...
CS 294-1: Assignment 2 A Large-Scale Linear Regression Sentiment Model
The primary objective of this assignment was to build a linear regression sentiment model based on amazon.com reviews. The main challenge was handling moderately large amounts of data on a single machine. The different variations that I tried include the following: exact solution (L2 loss and ridge regularization), stochastic gradient with different training schemes and initialization,...
CS 294-1 Assignment 2 Report
In this report, we describe our implementation of a linear regression method to classify numerically scored sentiment data. The dataset, collected by Mark Dredze and others at Johns Hopkins, contains 1M amazon.com book reviews. The linear regression classification starts with reading tokenized data and building a word-count map, and then training a linear classifier by minimizing error te...
CS 294-1 A1: Naive Bayesian Classifier
Settings. Our code was written in Scala and compiled with Simple Build Tool (SBT). The programs were run on Mac OS. We test the effectiveness of our implementation in various aspects. Unless stated otherwise, we adopt the following default settings. We report macro-averaged F1 measures, further averaged over ten-fold cross-validation. We consider both “Bernoulli” and “Multinomia...
U.C. Berkeley Handout N10 CS 294: Pseudorandomness and Combinatorial Constructions
Today we will study some conditions under which a very powerful pseudorandom generator can be shown to exist, and also some consequences of the existence of such a pseudorandom generator. We will start by assuming the existence of a permutation $p : \{0,1\}^n \to \{0,1\}^n$ which is computable in $\mathrm{poly}(n)$ time and which, for some constant $\delta > 0$, is $(2^{\delta n}, 2^{-\delta n})$-one way. (This is an extremely strong assum...
Publication date: 2012